ABSTRACT

This paper empirically and systematically assessed
the performance of the bootstrap resampling procedure as applied
to a regression model. Parameter estimates from Monte Carlo
experiments (repeated sampling from a population) and bootstrap
experiments (repeated resampling from one original bootstrap sample)
were generated and compared. Sample sizes of 20, 30, 50, and 100 were
considered in the simulation. Ten independent Monte Carlo experiments
and 10 independent bootstrap experiments were conducted
for each sample size condition, with 1,000 samples (resamples for
bootstrap) in each experiment. Estimates of standardized regression
coefficients were obtained from each sample, and the mean estimates
across samples were evaluated in relation to the population
parameters. The results indicate that, as the number of resamples
increased, the mean bootstrapped estimates did not show a clear
tendency to converge on the population parameters. However, as the
original bootstrap sample size increased, the quality of the
bootstrapped estimates improved. For the case of regression analysis,
the results raise some concern about the validity of the assumption
underlying the bootstrap procedure. (Contains 27 references and 9
figures.) (Author)

Background

In educational and psychological research, the overreliance 
on statistical significance testing has been challenged on 
several grounds, including issues related to sample size and to 
the validity of theoretical assumptions underlying parametric 
statistical techniques (Carver, 1978; Shaver, 1993; Thompson,
1989). The sample size issue becomes prominent due to the fact
that any null hypothesis can be rejected (statistically 
significant) when sample size is large enough, and the importance 
of statistical significance tends to be greatly exaggerated in 
research practice. As to the assumptions required for parametric 
statistical techniques, often, these assumptions are difficult, 
if not impossible, for research practitioners to meet or assess. 

To avoid the blind reliance on statistical significance 
testing, some researchers have turned to methods which are more 
empirically grounded. The bootstrap procedure, which is
computing-intensive in nature, has become prominent in recent years as a
complement to traditional statistical significance testing,
or as an alternative approach to making statistical decisions
(Thompson, 1993). Instead of relying on a theoretical sampling
distribution and on sample size, the bootstrap procedure, through
repeated resampling with replacement from the original sample,
empirically generates an estimated sampling distribution, upon which
statistical decisions can be based (Diaconis & Efron, 1983;
Efron, 1979; Lunneborg, 1990; Thompson, 1992). In this sense,
the bootstrap procedure attempts to avoid the pitfalls associated
with traditional statistical significance testing, such as
the sample size issue and the concern about the validity of the
theoretical assumptions for the data in hand. Since its
debut in the late 1970s (Efron, 1979), the bootstrap method has
gradually attracted the attention of researchers in the
educational and psychological research arena (Lunneborg, 1983;
Lunneborg, 1987). With increasingly easy access to powerful
computing facilities, this method has become more attractive than
before.

Researchers in the educational and psychological research
arena have applied the bootstrap technique to a variety of
research problems, ranging from measurement issues, such as item
discrimination and item bias indices, to multivariate statistical
techniques, such as principal component analysis, factor
analysis, and structural equation modeling (Bentler, 1992;
Daniel, 1992; Harris & Kolen, 1988, 1989; Lambert, Wildt &
Durand, 1990, 1991; Mendoza, Hart & Powell, 1991; Thompson, 1988;
Thompson & Melancon, 1990). Some researchers have also provided
theoretical rationales or simulation results attesting to the
applicability of this procedure to some widely used statistical
methods (Bickel & Freedman, 1981; Freedman, 1981; Wu, 1986).

Though the bootstrap procedure is promising as a complement or
an alternative to traditional statistical significance
testing, it may have its own weaknesses, which have not been
adequately investigated empirically. One potential weakness was
pointed out by Bollen and Stine (1993) in the area of structural
equation modeling research; they indicated that conventional
bootstrap resampling might fail to generate the intended
empirical distributions for some sample statistics. Another
example comes from a study by Tryon (1984), which showed that
bootstrapped estimates failed to converge on such basic
population parameters as means and standard deviations.

One potential problem with the bootstrap procedure may be
related to the underlying assumption of bootstrap resampling
itself. Bootstrap resampling relies solely on one original
sample. In the argument of Diaconis and Efron (1983), the
underlying assumption of the bootstrap procedure is that large
sample or population results can be approached or reconstructed
by repeated random resampling from a small sample. (It is in the
sense of using this sample at hand and pulling oneself up by
one's own bootstraps that the procedure acquired its name.) As
empirical support for the validity of this assumption, Diaconis
and Efron (1983) demonstrated that the population
correlation between Grade Point Average (GPA) and average Law
School Admission Test (LSAT) score for all 82 American law schools in
1973 could be approached by repeated random resampling from a
bootstrap sample of 15 law schools. Lunneborg (1983) stated this
assumption more clearly: "The bootstrap conjecture is that the
sampling distribution of the statistic being studied and the
sampling distribution found from this iterative process are
essentially identical for a wide variety of statistics" (p. 1).

The underlying assumption that, through repeated resampling,
enough information can be extracted from a small sample to
reconstruct large sample or population results may not have been
subjected to rigorous empirical scrutiny. In sampling, even
assuming that sound sampling techniques are used, a particular
sample may have its own idiosyncrasies due to sampling
fluctuations. If random samples are repeatedly drawn from a
population, these sampling idiosyncrasies tend to cancel
out. In bootstrap resampling, however, repeated random samples
are drawn from one original sample, and we may be
capitalizing on the sampling idiosyncrasies associated with that
particular sample. Consequently, the resultant picture of the
empirical sampling distribution provided by the bootstrap procedure
may be a distorted one (Fan, 1993).

Monte Carlo Simulation and Bootstrap Resampling

In this study, a clear distinction is made between Monte
Carlo simulation and bootstrap resampling. In a Monte Carlo
experiment, new random samples are repeatedly drawn from a
population with known parameters. From each new sample, the
statistic is obtained as the estimate of the population
parameter. The performance of these statistics across all the
random samples is then examined relative to the known population
parameter. In a bootstrap experiment, however, one original random
sample is drawn from a population with known parameters. Random
samples are then repeatedly drawn from this original bootstrap
sample using the technique of sampling with replacement. From
each resample, the statistic for the population parameter is
obtained. The performance of these statistics over all the
resamples is then examined relative to the known population
parameter.
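
The contrast between the two sampling schemes can be sketched in a few lines of Python. This is an illustration only, not the SAS/IML code used in the study: the standard normal population, the use of the sample mean as the statistic, and the function names are all assumptions made for brevity.

```python
import random
import statistics

def monte_carlo_estimates(n, n_samples, rng):
    """Monte Carlo: draw a NEW sample of size n from the population each time."""
    estimates = []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed N(0, 1) population
        estimates.append(statistics.fmean(sample))
    return estimates

def bootstrap_estimates(n, n_resamples, rng):
    """Bootstrap: draw ONE original sample, then resample it with replacement."""
    original = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(original, k=n)  # sampling with replacement
        estimates.append(statistics.fmean(resample))
    return original, estimates

rng = random.Random(12345)
mc = monte_carlo_estimates(20, 1000, rng)
original, boot = bootstrap_estimates(20, 1000, rng)
print("Monte Carlo mean of estimates:", statistics.fmean(mc))
print("Bootstrap mean of estimates:  ", statistics.fmean(boot))
```

The only difference between the two functions is where each sample comes from: the population itself, or the single original sample.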

The Monte Carlo experiment represents the classical approach to
estimating the probability of a certain event. The underlying
assumption of this approach is straightforward: sampling fluctuations
tend to cancel out over repeated random sampling from the
population, and as the number of samples increases, the mean
statistic over the samples will converge on the population
parameter. An intuitive example may help shed some light on this
logic. Suppose we know that for a good and even coin, the
likelihood of obtaining either a head or a tail from each
flip is 0.5. In an empirical experiment to check one particular
coin, we flip it 10 times. We may not obtain exactly five heads and
five tails due to sampling fluctuations, and we would probably
not be surprised to see eight heads and two tails. But if we
repeat our experiment 1,000 times with 10 flips in each
experiment, we have reason to believe that the average numbers of
heads and tails over the 1,000 repeated experiments will approach the
population value of 5, instead of 8 or 2 as in the first experiment.
If the average numbers of heads and tails over 1,000 experiments
turn out to be 8 and 2 respectively, we may become suspicious
about the quality of our coin, because over repeated sampling, a
good and even coin is highly unlikely to give us estimates so far
from the theoretical population parameters.
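
The coin example can be simulated directly. The sketch below assumes a fair coin (probability of heads 0.5) and uses an arbitrary seed; the function name is hypothetical.

```python
import random

def heads_in_experiment(rng, flips=10, p_head=0.5):
    """Number of heads in one experiment of `flips` coin flips."""
    return sum(rng.random() < p_head for _ in range(flips))

rng = random.Random(2024)
counts = [heads_in_experiment(rng) for _ in range(1000)]

first = counts[0]                      # result of the first experiment alone
average = sum(counts) / len(counts)    # average over 1,000 experiments
print("heads in first experiment:", first)
print("average heads over 1,000 experiments:", average)
```

A single experiment can land well away from 5, while the average over many experiments should sit close to it.
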
Now suppose we have a coin which has previously been proven to be
good and even, and we want to test a new coin-flipping device
(sampling procedure). If, from our repeated sampling, the
estimate of the probability of a head or a tail converges
on values of, say, 0.2 or 0.8, rather than on the known parameter
of 0.5, then we have reason to call into question the integrity
of the new flipping device itself.

The purpose of the present study is to examine empirically 
the performance of the bootstrap resampling procedure as applied 
to regression analysis. For this purpose, computer simulation is 
used to examine the characteristics of standardized regression 
coefficient estimates from both the Monte Carlo and the bootstrap 
experiments. 

Methods 

To assess the performance of bootstrap resampling, bootstrap
experiments were conducted, and the estimates from the bootstrap
experiments were compared with the known population parameters
and with those from Monte Carlo experiments. Since sample size
may affect how well a statistic converges on its parameter,
several sample size conditions were considered in the study.

A regression model with one dependent variable (Y) regressed
on two independent variables (X1, X2) was used in the simulation.
The population correlations among the three variables (Y, X1, X2)
are specified as follows:
Through the procedures proposed by Kaiser and Dickman (1962),
samples were generated from the population with the
intercorrelations as specified above.
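
As a rough illustration of generating samples with prescribed intercorrelations, the sketch below transforms independent standard normal draws with a Cholesky factor of the correlation matrix. This is a swapped-in technique, not the Kaiser and Dickman (1962) procedure itself, which works from a principal component factor pattern. The correlation values in `R` are hypothetical placeholders, not the values used in the study.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def draw_sample(n, corr, rng):
    """n rows of (Y, X1, X2) from a normal population with correlations `corr`."""
    lower = cholesky(corr)
    size = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(size)]  # uncorrelated normals
        rows.append([sum(lower[i][k] * z[k] for k in range(i + 1))
                     for i in range(size)])
    return rows

# HYPOTHETICAL correlations among (Y, X1, X2), for illustration only.
R = [[1.0, 0.5, 0.6],
     [0.5, 1.0, 0.3],
     [0.6, 0.3, 1.0]]

sample = draw_sample(20, R, random.Random(7))
print(sample[0])
```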

Since the population intercorrelations are known for the
variables in the regression model

$Y = X\beta + \varepsilon$

the standardized regression coefficients for the population
(population parameters) are fully specified to be (Johnson &
Wichern, 1988)
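
For the two-predictor case, the general closed form of this result is standard and can be written out as follows. The symbols $r_{Y1}$, $r_{Y2}$, and $r_{12}$ (the population correlations of Y with X1, of Y with X2, and of X1 with X2) are notation assumed here for illustration; the numeric values used in the study are not reproduced.

```latex
\[
\boldsymbol{\beta} = R_{XX}^{-1}\, r_{XY}
\quad\Longrightarrow\quad
\beta_1 = \frac{r_{Y1} - r_{Y2}\, r_{12}}{1 - r_{12}^{2}},
\qquad
\beta_2 = \frac{r_{Y2} - r_{Y1}\, r_{12}}{1 - r_{12}^{2}}
\]
```
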
Four sample size conditions of 20, 30, 50, and 100 were used
in the simulation. In order to reduce the likelihood of
haphazard chance discovery, ten independent Monte Carlo
experiments and ten independent bootstrap experiments were
conducted for each sample size condition. (Total number of Monte
Carlo experiments: 4 x 10 = 40; total number of bootstrap
experiments: 4 x 10 = 40.) For each experiment (Monte Carlo or
bootstrap), 1,000 samples (for a bootstrap experiment, 1,000
resamples) were drawn, and sample standardized regression
coefficients were obtained from each sample. Altogether, for
each sample size condition, a total of 10,000 samples were
generated for each of the Monte Carlo and bootstrap procedures.
A schematic representation of the study for one sample size
condition is presented in Figure 1.

Since the individual samples (for Monte Carlo experiments)
and the original bootstrap samples (one for each bootstrap
experiment) were randomly drawn from a population with known
parameters (standardized regression coefficients), the estimates
of the parameters obtained from these samples were expected to
converge on the parameters as the number of replications
increased. Failure in this respect would indicate
problems with the sampling procedure. In other words, if the
sampling procedure works as it should, logic dictates
that the mean statistic based on ten samples will tend to be
closer to the parameter than the statistic based on a single
sample. In the same vein, the mean statistic over 100 samples
will tend to be closer to the parameter than that over 10
samples, and so on. As the number of samples increases, the mean
statistic will tend to approach the population parameter, or
converge on the parameter.
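
The convergence check described above amounts to averaging the first k estimates for increasing k. A minimal sketch follows, in which the parameter value (0.5), the noise level, and the function name are hypothetical stand-ins for real sample estimates.

```python
import random

CHECKPOINTS = (1, 10, 100, 500, 1000)

def checkpoint_means(estimates, checkpoints=CHECKPOINTS):
    """Mean of the first k estimates, for each checkpoint k."""
    return {k: sum(estimates[:k]) / k for k in checkpoints}

rng = random.Random(99)
parameter = 0.5  # hypothetical population parameter
# Stand-in for 1,000 sample estimates scattered around the parameter.
estimates = [parameter + rng.gauss(0.0, 0.2) for _ in range(1000)]

means = checkpoint_means(estimates)
for k in CHECKPOINTS:
    print(k, round(means[k], 4), "distance from parameter:",
          round(abs(means[k] - parameter), 4))
```
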
Bootstrap Experiments

For each of the four sample size conditions (n = 20, 30, 50,
100), ten independent bootstrap experiments were conducted. For
each experiment, one original bootstrap sample was first drawn
from the population with the known parameters (standardized
regression coefficients). From this one bootstrap sample, 1,000
resamples were drawn by sampling with replacement. From each
bootstrap resample, the sample standardized regression
coefficients for the two independent variables (Beta1 for X1, and
Beta2 for X2) were obtained. So for each independent bootstrap
experiment, a total of 1,000 estimates were obtained for each of
Beta1 (X1) and Beta2 (X2).

As reasoned above, an unbiased estimate of a population
parameter should possess the property of convergence on the
population parameter over repeated sampling. Thus, as a general
tendency, the mean statistic based on several samples is expected
to be closer to the parameter than the statistic based on a
single sample, and the mean statistic based on a large number of
samples is expected to be closer to the parameter than that based
on a small number of samples. To check this important property
expected of an unbiased estimate, the following indices were
obtained for each bootstrap experiment:

1) the parameter estimate from the first bootstrap resample; 

2) the mean parameter estimate based on the first ten 
bootstrap resamples; 

3) the mean parameter estimate based on the first 100
bootstrap resamples;

4) the mean parameter estimate based on the first 500
bootstrap resamples; and

5) the mean parameter estimate over all the 1,000 bootstrap 
resamples. 

These indices were compared with the known population
parameters to see how well these statistics would converge on the
population parameters.
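
One bootstrap experiment and its five indices can be sketched as below. The population here is a hypothetical simplification (independent predictors, with assumed coefficients 0.4 and 0.5), not the population specified in the study, and all function names are illustrative.

```python
import math
import random

CHECKPOINTS = (1, 10, 100, 500, 1000)

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def standardized_betas(rows):
    """Standardized coefficients of Y on X1 and X2 from rows of (y, x1, x2)."""
    y, x1, x2 = zip(*rows)
    r_y1, r_y2, r_12 = correlation(y, x1), correlation(y, x2), correlation(x1, x2)
    denom = 1.0 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

def draw_original_sample(n, rng, b1=0.4, b2=0.5):
    """HYPOTHETICAL population: independent X1, X2; Y scaled to unit variance."""
    error_sd = math.sqrt(1.0 - b1 ** 2 - b2 ** 2)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append((b1 * x1 + b2 * x2 + rng.gauss(0.0, error_sd), x1, x2))
    return rows

def bootstrap_indices(original, rng, n_resamples=1000):
    """Mean (Beta1, Beta2) over the first k resamples, for each checkpoint k."""
    n = len(original)
    estimates = [standardized_betas(rng.choices(original, k=n))
                 for _ in range(n_resamples)]
    indices = {}
    for k in CHECKPOINTS:
        first_k = estimates[:k]
        indices[k] = (sum(e[0] for e in first_k) / k,
                      sum(e[1] for e in first_k) / k)
    return estimates, indices

rng = random.Random(2718)
original = draw_original_sample(30, rng)
estimates, indices = bootstrap_indices(original, rng)
for k in CHECKPOINTS:
    print(k, indices[k])
```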

Monte Carlo Experiments 

Monte Carlo experiments were conducted in the same fashion
as the bootstrap experiments. The only difference, as explained
previously, is in how the samples were drawn. For each
independent Monte Carlo experiment, the same statistics were
obtained, and the same five indices were calculated to check how
well the statistics converged on the population parameters:

1) the parameter estimate from the first Monte Carlo sample; 

2) the mean parameter estimate over the first ten Monte 
Carlo samples; 

3) the mean parameter estimate over the first 100 Monte 
Carlo samples; 

4) the mean parameter estimate over the first 500 Monte
Carlo samples; and

5) the mean parameter estimate over all the 1,000 Monte 
Carlo samples. 
Data Generation and Calculation of Statistics

All data generation, sampling (including sampling with
replacement for bootstrap), and calculations were accomplished
using the Interactive Matrix Language (PROC IML) under the
Statistical Analysis System (SAS). Random normal samples were
generated using the random number generator for the normal
distribution (RANNOR under SAS). Random sampling with
replacement for the bootstrap procedure was accomplished by
generating a vector of random numbers using the random number
generator for the uniform distribution (RANUNI under SAS). Each
element of the vector was independently generated (to accomplish
the feature of "with replacement" required by bootstrap) and
constrained to be an integer between 1 and m, with m being the
original bootstrap sample size. This vector of integers was then
used as the index numbers to draw row vectors (samples) from the
original bootstrap sample data matrix of m x 3 dimensions.
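
The index-vector approach to sampling with replacement can be sketched as follows. The mapping from a uniform draw u to an integer, floor(u * m) + 1, is an assumption about how the constraint to integers between 1 and m might be implemented; the text does not state the exact transformation used in SAS. The function names and the small data matrix are hypothetical.

```python
import math
import random

def resample_indices(m, rng):
    """m independent integers in 1..m, each from a uniform draw on [0, 1)."""
    return [math.floor(rng.random() * m) + 1 for _ in range(m)]

def resample_rows(data, rng):
    """Draw len(data) rows from data with replacement using an index vector."""
    indices = resample_indices(len(data), rng)
    return [data[i - 1] for i in indices]  # shift 1-based index to 0-based

rng = random.Random(42)
# Hypothetical m x 3 data matrix with rows of (Y, X1, X2); here m = 5.
data = [(0.1, 1.2, -0.3), (0.9, 0.4, 0.8), (-1.1, -0.7, 0.2),
        (0.3, 0.0, 1.5), (1.7, 0.6, -0.9)]

indices = resample_indices(len(data), rng)
resample = resample_rows(data, rng)
print(indices)
print(resample)
```

Because each index is drawn independently, the same row can appear more than once in a resample and other rows can be absent.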

Samples of the three variables Y, X1, and X2, drawn from a
population with the intercorrelations specified in Table 1, were
generated through implementation of the procedures proposed by
Kaiser and Dickman (1962). All calculations were accomplished using
matrix language programming under PROC IML of SAS. For quality
control purposes, before iterations began, the results of every step
of the matrix language programming were compared, and found to be in
agreement, with the results from regular SAS procedures such as
PROC REG, PROC MEANS, etc. The whole simulation process for both
Monte Carlo and bootstrap experiments was accomplished using
SAS Version 6.08 for Microsoft Windows on an IBM computer with a
486DX 66 MHz CPU with a built-in math co-processor.

Results and Discussions 

Because the huge number of sample estimates makes them
difficult to tabulate, a graphic approach is used to present the
data. Figure 2 to Figure 5 depict the convergence tendency of
the estimates for the four sample size conditions respectively.
In these figures, to check the property of convergence, the mean
statistics over successively larger numbers of samples are
plotted in relation to the parameter values:

1) the regression coefficient from the first sample
(resample for bootstrap);

2) the mean regression coefficient of the first ten samples; 

3) the mean regression coefficient of the first 100 samples; 

4) the mean regression coefficient of the first 500 samples; 

5) the mean regression coefficient of all the 1,000 samples. 

The figures show that the estimates from Monte Carlo
experiments (represented by "o" in the graphs) consistently
converge on the population parameters as the number of samples
increases, and this tendency is clear for both Beta1 and Beta2,
and for all sample size conditions. The estimates from bootstrap
experiments (represented by "*" in the graphs), however, do not
fare as well, since they do not seem to continue to approach the
parameter values as the number of resamples increases.
In particular, when the original bootstrap sample size is small
(e.g., n = 20), the divergence of bootstrap statistics from the
population parameters seems to be very substantial.

Although bootstrap estimates do not seem to exhibit the
tendency to converge on population parameters, a closer
examination of the figures reveals that, in terms of the degree of
divergence of the statistics from the population parameters,
bootstrap estimates show clear improvement as the original
bootstrap sample size increases. This is obvious when we compare
Figure 2 (sample size: 20) with Figure 5 (sample size: 100). The
improvement pattern is the same for both Beta1 and Beta2. This
indicates that, for the regression model, a larger sample size for
the original bootstrap sample may be necessary in order to avoid
potentially excessive divergence of bootstrap estimates from the
population parameter values. This may also partially explain the
non-convergence patterns exhibited by bootstrap estimates of
means and standard deviations as reported in Tryon's study
(Tryon, 1983), since only small sample sizes (15 and 25) were
used in the simulation in that study. It is possible that if
larger sample sizes had been considered in that study, the
quality of bootstrap estimates would have improved.

The fact that the bootstrap estimates exhibit little
tendency to converge on the population parameter values has some
interesting implications. As explained by Diaconis and Efron
(1983), the bootstrap procedure assumes that population results can
be approached or reconstructed through repeated sampling from the
small sample at hand. This assumption may have been accepted
prematurely without rigorous empirical scrutiny.

Conclusions 

Based on the results from these systematic simulation
experiments, three tentative conclusions may be drawn with regard
to the bootstrap resampling procedure. First, the assumption of the
bootstrap procedure, i.e., that large sample or population results may
be approached or reconstructed through intensive resampling of a
small sample at hand, may not be capable of withstanding rigorous
empirical test, and the validity of this assumption may be called
into question. Although Diaconis and Efron (1983) conceded that
the procedure might not work for a few samples, and that one could
not know in advance which they were, the results from this study
indicate that their view about the applicability of the procedure
still might have been overly sanguine. Especially considering
the generality of regression analysis as a general linear model
which subsumes a variety of parametric tests, the somewhat
disappointing performance of the procedure in this study may
become a source of concern.

Second, the bootstrap estimates of regression coefficients
are biased to a certain extent, in the sense that they do not
continue to approach the population parameters as the number of
resamples increases. In other words, in many cases, these
estimates may approach some value other than the population
parameter, and it is not known what that value is. The divergence
pattern of the bootstrap estimates is especially striking when
the original bootstrap sample size is small.

Last, there is a clear indication that the quality of
bootstrap estimates improves substantially as the original
bootstrap sample size increases, since with larger samples
the estimates tend to diverge much less from the
population parameters. If this can be replicated for other types
of statistical analysis, it may imply that it would be
advantageous or even necessary to use a larger original bootstrap
sample so as to reduce the potential divergence of bootstrap
estimates from the population parameters.
Figure 2: Mean Statistics and Parameter: Convergence Tendency for
Beta1 (Sample Size 20)

Figure 3: Mean Statistics and Parameter: Convergence Tendency for
Beta2 (Sample Size 20)

Figure 4: Mean Statistics and Parameter: Convergence Tendency for
Beta1 (Sample Size 30)

Figure 5: Mean Statistics and Parameter: Convergence Tendency for
Beta2 (Sample Size 30)

Figure 6: Mean Statistics and Parameter: Convergence Tendency for
Beta1 (Sample Size 50)

[Figure 7 plot: Beta2 (vertical axis, 0.2 to 0.8) against the number
of samples the mean statistics are based on (horizontal axis: 10, 100,
500, 1000); "o" marks Monte Carlo mean estimates and "*" marks
bootstrap mean estimates; the parameter line is at Beta2 = 0.541.]

Figure 7: Mean Statistics and Parameter: Convergence Tendency for
Beta2 (Sample Size 50)

Figure 8: Mean Statistics and Parameter: Convergence Tendency for
Beta1 (Sample Size 100)

Figure 9: Mean Statistics and Parameter: Convergence Tendency for
Beta2 (Sample Size 100)